@mjnovice (Contributor) commented Oct 17, 2025

(uipath-python) (kind-kind-cloudgen/default) 
 ➜  calculator git:(mj/wire-trajectory)  uipath eval main.py evals/eval-sets/default.json --no-report
→ Running Evaluations
  ○ Add - Running...
HTTP Request: GET https://alpha.uipath.com/uipathmj/DefaultTenant/orchestrator_/llm/api/capabilities "HTTP/1.1 500 Internal Server Error"
HTTP Request: GET https://alpha.uipath.com/uipathmj/DefaultTenant/agenthub_/llm/api/capabilities "HTTP/1.1 200 OK"
HTTP Request: POST https://alpha.uipath.com/uipathmj/DefaultTenant/agenthub_/llm/api/chat/completions?api-version=2024-08-01-preview "HTTP/1.1 200 OK"
HTTP Request: POST https://alpha.uipath.com/uipathmj/DefaultTenant/agenthub_/llm/api/chat/completions?api-version=2024-08-01-preview "HTTP/1.1 200 OK"
HTTP Request: POST https://alpha.uipath.com/uipathmj/DefaultTenant/agenthub_/llm/api/chat/completions?api-version=2024-08-01-preview "HTTP/1.1 200 OK"
▌ Add
  JsonSimilarityEvaluator                        1.0  
  TrajectoryEvaluator                            0.0  
  ExactMatchEvaluator                            1.0  
  LLMJudgeOutputEvaluator                        1.0  
  LLMJudgeStrictJSONSimilarityOutputEvaluator    1.0  
  ContainsEvaluator                              1.0  
─────────────────────────────────────────────────────────────────────────────────────── Execution Logs: Add ────────────────────────────────────────────────────────────────────────────────────────
  No execution logs
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  ○ Multiply - Running...
HTTP Request: POST https://alpha.uipath.com/uipathmj/DefaultTenant/agenthub_/llm/api/chat/completions?api-version=2024-08-01-preview "HTTP/1.1 200 OK"
HTTP Request: POST https://alpha.uipath.com/uipathmj/DefaultTenant/agenthub_/llm/api/chat/completions?api-version=2024-08-01-preview "HTTP/1.1 200 OK"
HTTP Request: POST https://alpha.uipath.com/uipathmj/DefaultTenant/agenthub_/llm/api/chat/completions?api-version=2024-08-01-preview "HTTP/1.1 200 OK"
▌ Multiply
  JsonSimilarityEvaluator                        1.0  
  TrajectoryEvaluator                            0.0  
  ExactMatchEvaluator                            1.0  
  LLMJudgeOutputEvaluator                        1.0  
  LLMJudgeStrictJSONSimilarityOutputEvaluator    1.0  
  ContainsEvaluator                              1.0  
───────────────────────────────────────────────────────────────────────────────────── Execution Logs: Multiply ─────────────────────────────────────────────────────────────────────────────────────
  No execution logs
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  ○ Skip denial code check - Running...
  ✓ Skip denial code check - No evaluators
────────────────────────────────────────────────────────────────────────────── Execution Logs: Skip denial code check ──────────────────────────────────────────────────────────────────────────────
  No execution logs
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Evaluation Results
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
┃  Evaluation              ┃  JsonSimilarityEvaluator  ┃  TrajectoryEvaluator  ┃  ExactMatchEvaluator  ┃  LLMJudgeOutputEvaluator  ┃  LLMJudgeStrictJSONSimilarityOutputEv…  ┃  ContainsEvaluator  ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
│  Add                     │                      1.0  │                  0.0  │                  1.0  │                      1.0  │                                    1.0  │                1.0  │
│  Multiply                │                      1.0  │                  0.0  │                  1.0  │                      1.0  │                                    1.0  │                1.0  │
│  Skip denial code check  │                        -  │                    -  │                    -  │                        -  │                                      -  │                  -  │
├──────────────────────────┼───────────────────────────┼───────────────────────┼───────────────────────┼───────────────────────────┼─────────────────────────────────────────┼─────────────────────┤
│  Average                 │                      1.0  │                  0.0  │                  1.0  │                      1.0  │                                    1.0  │                1.0  │
└──────────────────────────┴───────────────────────────┴───────────────────────┴───────────────────────┴───────────────────────────┴─────────────────────────────────────────┴─────────────────────┘
(uipath-python) (kind-kind-cloudgen/default) 

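Tested with the calculator agent as a dummy run of the trajectory eval. TrajectoryEvaluator now reports a score alongside the other evaluators (0.0 for both Add and Multiply in this run).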
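The Average row in the Evaluation Results table matches a plain per-evaluator mean over the evaluations that actually produced a score, with the `-` row ("Skip denial code check", which had no evaluators) left out. The sketch below reproduces that aggregation from the transcribed scores. It is only an illustration under that assumption: `RESULTS` and `average_scores` are hypothetical names, not part of the `uipath eval` CLI or its codebase.

```python
from statistics import mean
from typing import Dict, List, Optional

EVALUATORS: List[str] = [
    "JsonSimilarityEvaluator",
    "TrajectoryEvaluator",
    "ExactMatchEvaluator",
    "LLMJudgeOutputEvaluator",
    "LLMJudgeStrictJSONSimilarityOutputEvaluator",
    "ContainsEvaluator",
]

# Hypothetical transcription of the run's results; None stands for a "-" cell.
RESULTS: Dict[str, Dict[str, Optional[float]]] = {
    "Add": dict(zip(EVALUATORS, [1.0, 0.0, 1.0, 1.0, 1.0, 1.0])),
    "Multiply": dict(zip(EVALUATORS, [1.0, 0.0, 1.0, 1.0, 1.0, 1.0])),
    "Skip denial code check": {name: None for name in EVALUATORS},
}


def average_scores(
    results: Dict[str, Dict[str, Optional[float]]],
) -> Dict[str, Optional[float]]:
    """Mean score per evaluator, skipping evaluations with no score (assumed behavior)."""
    averages: Dict[str, Optional[float]] = {}
    for name in EVALUATORS:
        # Keep only real scores; "-" cells (None) are excluded, not counted as 0.0.
        scores = [s for s in (row.get(name) for row in results.values()) if s is not None]
        averages[name] = mean(scores) if scores else None
    return averages


if __name__ == "__main__":
    for name, avg in average_scores(RESULTS).items():
        print(f"{name:<45} {avg}")
```

Counting the `-` cells as 0.0 instead would pull each 1.0 average down to about 0.67, so the reported table is consistent with unscored evaluations being excluded rather than zero-filled.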

github-actions bot added the test:uipath-langchain (Triggers tests in the uipath-langchain-python repository) and test:uipath-llamaindex (Triggers tests in the uipath-llamaindex-python repository) labels Oct 17, 2025
@mjnovice force-pushed the mj/wire-trajectory branch 2 times, most recently from 748fba3 to 69ca78f, October 18, 2025 09:28
@mjnovice force-pushed the mj/wire-trajectory branch 4 times, most recently from 1936c1f to e6f0524, October 20, 2025 10:56
@mjnovice merged commit 4553f34 into release/revamped-evals Oct 20, 2025
24 of 40 checks passed
@mjnovice deleted the mj/wire-trajectory branch October 20, 2025 11:01